Large-Coverage Root Lexicon Extraction for Hindi
نویسندگان
چکیده
This paper describes a method using morphological rules and heuristics, for the automatic extraction of large-coverage lexicons of stems and root word-forms from a raw text corpus. We cast the problem of high-coverage lexicon extraction as one of stemming followed by root word-form selection. We examine the use of POS tagging to improve precision and recall of stemming and thereby the coverage of the lexicon. We present accuracy, precision and recall scores for the system on a Hindi corpus.
منابع مشابه
A Sentiment Analyzer for Hindi Using Hindi Senti Lexicon
Supervised approaches have proved their significance in sentiment analysis task, but they are limited to the languages, which have sufficient amount of annotated corpus. Hindi is a language, which is spoken by 4.70% of the world population, but it lacks a sufficient amount of annotated corpus for natural language processing tasks such as Sentiment Analysis (SA). With the increase in demand and ...
متن کاملAutomatic Generation of Compound Word Lexicon for Hindi Speech Synthesis
This paper addresses the problem of Hindi compound word splitting and its relevance to developing a good quality phonetizer for Hindi Speech Synthesis. The constituents of a Hindi compound word are not separated by space or hyphen. Hence, most of the existing compound splitting algorithms can not be applied to Hindi. We propose a new technique for automatic extraction of compound words from Hin...
متن کاملAutomatic Generation of Multilingual Lexicon by Using Wordnet
A lexicon is the heart of any language processing system. Accurate words with grammatical and semantic attributes are essential or highly desirable for any applicationbe it machine translation, information extraction, various forms of tagging or text mining. However, good quality lexicons are difficult to construct requiring enormous amount of time and manpower. In this paper, we present a meth...
متن کاملHindi Subjective Lexicon : A Lexical Resource for Hindi Polarity Classification
With recent developments in web technologies, percentage web content in Hindi is growing up at a lighting speed. This information can prove to be very useful for researchers, governments and organization to learn what’s on public mind, to make sound decisions. In this paper, we present a graph based wordnet expansion method to generate a full (adjective and adverb) subjective lexicon. We used s...
متن کاملA Generative Model of a Pronunciation Lexicon for Hindi
Voice browser applications in Text-toSpeech (TTS) and Automatic Speech Recognition (ASR) systems crucially depend on a pronunciation lexicon. The present paper describes the model of pronunciation lexicon of Hindi developed to automatically generate the output forms of Hindi at two levels, the and the (PS, in short for Prosodic Structure). The latter level involves both syllable-...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009